Conceptual document indexing using a large scale semantic dictionary providing a concept hierarchy
Abstract
Automatic indexing is one of the important technologies used for Textual Data Analysis applications. Standard document indexing techniques usually identify the most relevant keywords in the documents. This paper presents an alternative approach that aims at performing document indexing by associating concepts with the document to index instead of extracting keywords out of it. The concepts are extracted out of the EDR Electronic Dictionary that provides a concept hierarchy based on hyponym/hypernym relations. An experimental evaluation based on a probabilistic model was performed on a sample of the INSPEC bibliographic database and we present the promising results that were obtained during the evaluation experiments.
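The idea of indexing by concepts rather than keywords can be illustrated with a minimal sketch. This is not the paper's probabilistic model, and it does not use the EDR Electronic Dictionary: the hierarchy `HYPERNYM`, the lexicon `WORD_TO_CONCEPT`, and the `min_count` threshold are all hypothetical stand-ins chosen for illustration. Each recognized word contributes a count to its concept and to every hypernym above it, and the document is indexed with the most specific concepts that gather enough support.

```python
from collections import Counter

# Hypothetical toy hierarchy: concept -> its hypernym (None marks the root).
HYPERNYM = {
    "dog": "mammal",
    "cat": "mammal",
    "mammal": "animal",
    "sparrow": "bird",
    "bird": "animal",
    "animal": None,
}

# Hypothetical lexicon mapping surface words to leaf concepts.
WORD_TO_CONCEPT = {
    "dog": "dog", "dogs": "dog",
    "cat": "cat", "cats": "cat",
    "sparrow": "sparrow",
}


def ancestors(concept):
    """Yield the concept itself, then each hypernym up to the root."""
    while concept is not None:
        yield concept
        concept = HYPERNYM[concept]


def concept_counts(text):
    """Count, for every concept, how many tokens fall under it."""
    counts = Counter()
    for token in text.lower().split():
        concept = WORD_TO_CONCEPT.get(token)
        if concept is None:
            continue  # word not covered by the toy lexicon
        for node in ancestors(concept):
            counts[node] += 1
    return counts


def index_document(text, min_count=2):
    """Return the most specific concepts supported by at least min_count tokens."""
    counts = concept_counts(text)
    supported = {c for c, n in counts.items() if n >= min_count}
    # Counts never decrease going up the hierarchy, so dropping the direct
    # hypernyms of supported concepts leaves only the most specific ones.
    hypernyms_of_supported = {HYPERNYM[c] for c in supported}
    return sorted(supported - hypernyms_of_supported)


print(index_document("a dog chased a cat"))
```

In this toy setting, a document mentioning one dog and one cat is indexed under `mammal`, a concept that never appears in the text itself, which is the kind of generalization keyword extraction cannot provide.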
Related resources
Document Indexing With a Concept Hierarchy
We discuss the task of selecting the concepts that describe the contents of a given document. We propose to use a large hierarchical concept dictionary (thesaurus) for this task. A statistical method of document indexing driven by such a dictionary is proposed. The problem of handling non-terminal nodes in the hierarchy is discussed. Common sense-compliant methods of automatically assigning ...
Thematic Annotation: extracting concepts out of documents
Semantic document annotation may be useful for many tasks. In particular, in the framework of the MDM project, topical annotation – i.e. the annotation of document segments with tags identifying the topics discussed in the segments – is used to enhance the retrieval of multimodal meeting records. Indeed, with such an annotation, meeting retrieval can integrate topics in the search criteria offe...
Document Indexing with a Concept Hierarchy Índice de Documentos con una Jerarquía de Conceptos
Given a large hierarchical concept dictionary (thesaurus, or ontology), the task of selecting the concepts that describe the contents of a given document is considered. A statistical method of document indexing driven by such a dictionary is proposed. The method is insensitive to inaccuracies in the dictionary, which allows for semi-automatic translation of the hierarchy into different languag...
Indexing with a Concept Hierarchy
Given a large hierarchical concept dictionary (thesaurus, or ontology), the task of selecting the concepts that describe the contents of a given document is considered. A statistical method of document indexing driven by such a dictionary is proposed. The method is insensitive to inaccuracies in the dictionary, which allows for semiautomatic translation of the hierarchy into different language...
Reflections on Image Indexing: A Picture Is Worth a Thousand Words
Purpose: This paper presents various image indexing techniques and discusses their advantages and limitations. Methodology: Through a literature review, it identifies three main image indexing techniques, namely concept-based image indexing, content-based image indexing, and folksonomy. It then describes each technique. Findings: Concept-based image indexing is te...